Abstract. We give a suitable RI-Property under which recent results for trace regression
translate into strong risk bounds for multivariate regression. This pseudo-RIP is
compatible with the setting n < p.



1. Introduction 

1.1. Statistical framework. Multivariate regression deals with n observations of a
T-dimensional vector

    $y_i = A_0^T x_i + \varepsilon_i, \quad i = 1, \ldots, n,$

where $A_0^T$ is the transpose of a p x T matrix $A_0$. We have in mind that $A_0$ has a small
(unknown) rank and that the design $x_i$ is non-random. Writing Y, X and E for the matrices
with respective rows $y_i^T$, $x_i^T$ and $\varepsilon_i^T$, the above equation translates into

    $Y = X A_0 + E.$

Anderson [1] and Izenman [5] have introduced reduced-rank estimators

    $\hat A_r \in \operatorname{argmin}_{A :\, \operatorname{rank}(A) \le r} \|Y - XA\|^2, \quad r = 0, \ldots, \min(p, T),$

where $\|\cdot\|$ is the Hilbert-Schmidt norm associated with the scalar product
$\langle \cdot, \cdot \rangle$. The problem of selecting among the family of estimators
$\{\hat A_r,\ r = 0, \ldots, \min(p, T)\}$ by minimizing the criterion

    $\mathrm{Crit}(r) = \|Y - X\hat A_r\|^2 + \mathrm{pen}(r)\,\sigma^2$ and $\mathrm{Crit}'(r) = \log\left(\|Y - X\hat A_r\|^2\right) + \mathrm{pen}'(r)$



has been investigated recently from a non-asymptotic point of view by Bunea et al. [3]
and Giraud [4]. Both papers provide oracle bounds for the predictive risk

    $R(\hat A) = \mathbb{E}\left[\|X\hat A - XA_0\|^2\right]$

with no assumption on the design X.
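As an illustration of the reduced-rank estimators above, the following minimal sketch computes one minimizer for each rank r by projecting the least-squares fit onto its leading right singular vectors. The function name `reduced_rank_estimator`, the simulated data, and the noise level are illustrative assumptions and do not come from the cited papers.

```python
import numpy as np

def reduced_rank_estimator(X, Y, r):
    """Return one minimizer of ||Y - X A||^2 over p x T matrices A with rank(A) <= r."""
    A_ls = np.linalg.pinv(X) @ Y        # minimum-norm least-squares solution
    fitted = X @ A_ls                   # projection of Y onto the range of X
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:r].T                      # leading r right singular vectors of the fit
    return A_ls @ V_r @ V_r.T           # X times this is the best rank-r approximation of the fit

# Hypothetical simulated data: a rank-2 signal observed with small Gaussian noise.
rng = np.random.default_rng(0)
n, p, T, true_rank = 30, 8, 6, 2
X = rng.standard_normal((n, p))
A0 = rng.standard_normal((p, true_rank)) @ rng.standard_normal((true_rank, T))
Y = X @ A0 + 0.1 * rng.standard_normal((n, T))

# Residual sum of squares for each candidate rank; this is the data-fit term of Crit(r).
residuals = [np.linalg.norm(Y - X @ reduced_rank_estimator(X, Y, r)) ** 2
             for r in range(min(p, T) + 1)]
```

The residual term is non-increasing in r, which is why a penalty on the rank is needed to select r.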
Multivariate regression corresponds to a special case of the trace regression model

    $y_j = \langle Z_j, A_0 \rangle + \xi_j, \quad j = 1, \ldots, N,$

where $\langle Z_j, A_0 \rangle = \operatorname{tr}(Z_j^T A_0)$. Indeed, we have for all
$i \in \{1, \ldots, n\}$ and $t \in \{1, \ldots, T\}$

    $Y_{it} = \langle A_0^T x_i, e_t \rangle + E_{it} = \langle x_i e_t^T, A_0 \rangle + E_{it},$

where $\{e_1, \ldots, e_T\}$ is the canonical basis of $\mathbb{R}^T$. Many recent works [21I71E1E] have
investigated trace regression with nuclear norm penalization. Translated in terms of multivariate
regression, Nuclear-Norm-Penalized regression estimators are defined by

(1)    $\hat A_\lambda \in \operatorname{argmin}_{A \in \mathbb{R}^{p \times T}} \left\{ \|Y - XA\|^2 + \lambda \sum_k \sigma_k(A) \right\},$

where $\sigma_1(A) \ge \sigma_2(A) \ge \ldots$ are the singular values of A. Several risk bounds have
been obtained for the predictive risk of $\hat A_\lambda$ and they all require the assumption (semi-RI
Property)

(2)    $\|A\| \le \mu \|XA\|$, for all $A \in \mathbb{R}^{p \times T}$

for some positive $\mu$. In other words, the smallest eigenvalue of $X^T X$ must be at least
$1/\mu^2 > 0$. This forces the sample size n to be at least as large as the number p of parameters.
This assumption on the design needed for $\hat A_\lambda$ is thus very strong, in contrast with the
reduced-rank estimator $\hat A_r$, which requires no assumption on the design.
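The nuclear-norm-penalized criterion can be minimized numerically. The sketch below uses proximal gradient descent with singular-value soft-thresholding. This algorithm is a common choice for nuclear-norm problems and is an assumption made here for illustration, not a method described in this note. The function names, the iteration count, the simulated data, and the value of the tuning parameter are likewise hypothetical.

```python
import numpy as np

def svd_soft_threshold(M, tau):
    """Shrink every singular value of M by tau, truncating at zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(X, Y, A, lam):
    """Penalized criterion ||Y - X A||^2 + lam * (sum of singular values of A)."""
    return np.linalg.norm(Y - X @ A) ** 2 + lam * np.linalg.svd(A, compute_uv=False).sum()

def nuclear_norm_penalized(X, Y, lam, n_iter=500):
    """Approximate minimizer of the penalized criterion by proximal gradient steps."""
    p, T = X.shape[1], Y.shape[1]
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the gradient
    A = np.zeros((p, T))
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ A - Y)             # gradient of ||Y - X A||^2
        A = svd_soft_threshold(A - grad / lipschitz, lam / lipschitz)
    return A

# Hypothetical example with fewer observations than covariates (n < p).
rng = np.random.default_rng(0)
n, p, T = 10, 20, 5
X = rng.standard_normal((n, p))
A0 = rng.standard_normal((p, 1)) @ rng.standard_normal((1, T))
Y = X @ A0 + 0.1 * rng.standard_normal((n, T))

lam = np.linalg.norm(X.T @ Y, 2)   # arbitrary illustrative value of the tuning parameter
A_hat = nuclear_norm_penalized(X, Y, lam)
```

Starting from the zero matrix, every iterate keeps its columns in the range of $X^T$, so the scheme runs without modification when n < p.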

1.2. Object of this note. In this note, we emphasize that Assumption (2), coming
from the general trace regression framework, can be weakened for the multivariate regression
framework. Under this (much) weaker assumption, we show that the analysis of Kolchinskii
et al. [6] gives an oracle bound with leading constant 1 for the estimators $\hat A_\lambda$.

2. Semi-RIP for multivariate regression 

Condition (2) requires the sample size n to be at least as large as the number p of covariates.
Is it still possible to get an oracle bound on $\|X\hat A_\lambda - XA_0\|^2$ when n is smaller than p?

The analysis of Theorem 12 in Bunea et al. [3] suggests that Condition (2) only needs
to hold for matrices A of rank at most twice the rank of $A_0$. Unfortunately, when
the rank of $A_0$ is positive, this condition is still equivalent to requiring that the smallest
eigenvalue of $X^T X$ is at least $1/\mu^2 > 0$.

In the analysis of Kolchinskii et al. [6], Condition (2) is needed for comparing
$\|\hat A_\lambda - A\|$ to $\|X\hat A_\lambda - XA\|$, see for example Display (2.17) of [6]. We point out below
that this inequality need not hold for all matrices $\hat A_\lambda$ and A, so that Condition (2) can
be relaxed to handle cases where p > n.
Assumption 1.

    $\sigma_{\operatorname{rank}(X)}(X) \ge 1/\mu > 0,$

where $\sigma_1(X) \ge \sigma_2(X) \ge \ldots$ are the singular values of X.

The singular value $\sigma_{\operatorname{rank}(X)}(X)$ is always positive but can be arbitrarily small. Assumption 1
requires a positive lower bound on this singular value.
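To make the contrast with Condition (2) concrete, the sketch below takes a design with n < p and computes both the smallest positive singular value of X, which Assumption 1 constrains, and the smallest eigenvalue of $X^T X$, which Condition (2) constrains. The Gaussian design, the dimensions, and the helper name are illustrative assumptions.

```python
import numpy as np

def smallest_positive_singular_value(X):
    """Return (sigma_q(X), q) with q = rank(X); X is assumed to be nonzero."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values in decreasing order
    q = np.linalg.matrix_rank(X)
    if q == 0:
        raise ValueError("X must be a nonzero matrix")
    return s[q - 1], q

# Hypothetical design with far fewer observations than covariates.
rng = np.random.default_rng(1)
n, p = 10, 40
X = rng.standard_normal((n, p))

sigma_q, q = smallest_positive_singular_value(X)
mu = 1.0 / sigma_q                            # smallest mu for which Assumption 1 holds

# The smallest eigenvalue of X^T X is zero (up to rounding) as soon as p > n,
# so no finite mu can satisfy Condition (2) for this design.
smallest_eig = np.linalg.eigvalsh(X.T @ X)[0]
```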

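2.1. Risk bound under Assumption 1. Write $\operatorname{rg}(X^T)$ for the range of the linear
operator $X^T$ and $\Pi_{\operatorname{rg}(X^T)}$ for the orthogonal projection onto the range of $X^T$ in
$\mathbb{R}^p$. Since we have the orthogonal decomposition $\mathbb{R}^p = \ker(X) \oplus \operatorname{rg}(X^T)$,
we have $X \Pi_{\operatorname{rg}(X^T)} A = XA$ for any matrix A. In addition,
$\sigma_k(\Pi_{\operatorname{rg}(X^T)} A) \le \sigma_k(A)$ for any k and matrix A, so

    $\sum_k \sigma_k(\Pi_{\operatorname{rg}(X^T)} A) \le \sum_k \sigma_k(A),$

with strict inequality if $\Pi_{\operatorname{rg}(X^T)} A \ne A$. As a consequence, we have
$\Pi_{\operatorname{rg}(X^T)} \hat A_\lambda = \hat A_\lambda$, so $\hat A_\lambda$ is also a minimizer of

(3)    $\min_{A \in \mathcal{A}} \left\{ \|Y - XA\|^2 + \lambda \sum_k \sigma_k(A) \right\},$

where $\mathcal{A} := \{A \in \mathbb{R}^{p \times T} : \operatorname{rg}(A) \subset \operatorname{rg}(X^T)\}$.

Under Assumption 1, we have

    $\|A\| \le \mu \|XA\|$, for all $A \in \mathcal{A}$.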
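The three facts used in this argument can be checked numerically on a design with n < p: projecting A onto the range of $X^T$ leaves XA unchanged, it decreases the sum of singular values, and the norm inequality holds on the projected matrices even though it fails on the kernel of X. The sketch below is only a sanity check on simulated data. The variable names, dimensions, and rounding tolerances are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, T = 10, 40, 5
X = rng.standard_normal((n, p))

s = np.linalg.svd(X, compute_uv=False)
q = np.linalg.matrix_rank(X)
mu = 1.0 / s[q - 1]                  # smallest mu allowed by Assumption 1

P = np.linalg.pinv(X) @ X            # orthogonal projection onto the range of X^T
A = rng.standard_normal((p, T))      # arbitrary matrix, not restricted to the range of X^T
PA = P @ A

def nuclear_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

# 1. The projection does not change the fitted values.
same_fit = np.allclose(X @ PA, X @ A)

# 2. The projection decreases the nuclear norm when PA differs from A.
nuclear_drop = nuclear_norm(A) - nuclear_norm(PA)

# 3. The norm inequality holds for matrices with range inside rg(X^T) ...
restricted_ok = np.linalg.norm(PA) <= mu * np.linalg.norm(X @ PA) * (1 + 1e-10)

# ... but cannot hold on the kernel of X, where X N vanishes while N does not.
N = A - PA
kernel_image_norm = np.linalg.norm(X @ N)
kernel_norm = np.linalg.norm(N)
```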
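Theorem 1 of Kolchinskii et al. [6] then gives the upper bound

    $\|X\hat A_\lambda - XA_0\|^2 \le \inf_{A \in \mathcal{A}} \left\{ \|XA - XA_0\|^2 + \left(\frac{1+\sqrt{2}}{2}\right)^2 \mu^2 \lambda^2 \operatorname{rank}(A) \right\}$

for $\lambda \ge 2\sigma_1(X^T E)$. Again, since $X \Pi_{\operatorname{rg}(X^T)} A = XA$ and
$\operatorname{rank}(\Pi_{\operatorname{rg}(X^T)} A) \le \operatorname{rank}(A)$, the
infimum on the right hand side coincides with the infimum on the whole space $\mathbb{R}^{p \times T}$. We
then have the following result.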

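Theorem 1. Let $\hat A_\lambda$ be defined by (1). Then, under Assumption 1, for $\lambda \ge 2\sigma_1(X^T E)$
we have

    $\|X\hat A_\lambda - XA_0\|^2 \le \inf_{A \in \mathbb{R}^{p \times T}} \left\{ \|XA - XA_0\|^2 + \frac{3}{2} \mu^2 \lambda^2 \operatorname{rank}(A) \right\}$

    $= \inf_r \left\{ \sum_{k \ge r+1} \sigma_k(XA_0)^2 + \frac{3}{2} \mu^2 \lambda^2 r \right\}.$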
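2.2. Case of Gaussian errors. The above statement is purely deterministic. In the case
of Gaussian errors we have the following corollary.

Corollary 1. Assume that the entries of E are i.i.d. with Gaussian $\mathcal{N}(0, \sigma^2)$ distribution.
Let K > 1 and set

    $\lambda = 2K \sigma_1(X) (\sqrt{T} + \sqrt{q})\, \sigma$, with $q = \operatorname{rank}(X)$.

Then, with probability larger than $1 - e^{-(K-1)^2 (T+q)/2}$, we have

    $\|X\hat A_\lambda - XA_0\|^2 \le \inf_{A \in \mathbb{R}^{p \times T}} \left\{ \|XA - XA_0\|^2 + 6K^2 \frac{\sigma_1(X)^2}{\sigma_q(X)^2} (\sqrt{T} + \sqrt{q})^2 \sigma^2 \operatorname{rank}(A) \right\}$

(4)    $= \inf_r \left\{ \sum_{k \ge r+1} \sigma_k(XA_0)^2 + 6K^2 \frac{\sigma_1(X)^2}{\sigma_q(X)^2} (\sqrt{T} + \sqrt{q})^2 \sigma^2 r \right\}.$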
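The probability statement in Corollary 1 concerns the event that the chosen tuning parameter dominates twice the largest singular value of $X^T E$. The following Monte Carlo sketch estimates how often this event occurs for one simulated design and compares the frequency with the guaranteed lower bound. The design, the dimensions, the values of K and of the noise level, and the number of draws are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, T = 10, 40, 5
noise_sd, K = 1.0, 1.5                    # noise standard deviation and constant K > 1
X = rng.standard_normal((n, p))           # hypothetical fixed design with n < p

q = np.linalg.matrix_rank(X)
sigma_1 = np.linalg.norm(X, 2)            # largest singular value of X
lam = 2.0 * K * sigma_1 * (np.sqrt(T) + np.sqrt(q)) * noise_sd

# Frequency of the event lam >= 2 * sigma_1(X^T E) over independent noise draws.
n_draws = 2000
hits = 0
for _ in range(n_draws):
    E = noise_sd * rng.standard_normal((n, T))
    if lam >= 2.0 * np.linalg.norm(X.T @ E, 2):
        hits += 1
empirical = hits / n_draws

# Lower bound on the probability of the event stated in Corollary 1.
guaranteed = 1.0 - np.exp(-((K - 1.0) ** 2) * (T + q) / 2.0)
```

On this event the deterministic Theorem 1 applies with $\mu = 1/\sigma_q(X)$, and substituting the value of $\lambda$ turns the factor $\frac{3}{2} \mu^2 \lambda^2$ into the constant $6K^2 \sigma_1(X)^2 / \sigma_q(X)^2 \, (\sqrt{T} + \sqrt{q})^2 \sigma^2$ of the corollary.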



3. Discussion 

Assumption 1, which requires that the smallest positive singular value of X is bounded
from below, is much weaker than Assumption (2). In particular, this condition is fully
compatible with the setting where the sample size n is smaller than the number p of
covariates.

The inequality (4) for the Nuclear Norm Penalized estimator suggests that the suitable
"RI-Property" for prediction in multivariate regression is

RI-Property: There exists $\eta \in [1, +\infty[$ such that

    $1 \le \sigma_1(X)/\sigma_q(X) \le \eta$, with $q = \operatorname{rank}(X)$.

When this condition is met with $\eta$ of reasonable size, the NNP-estimator achieves, under
the assumptions of Corollary 1, the oracle inequality

    $\|X\hat A_\lambda - XA_0\|^2 \le \inf_{A \in \mathbb{R}^{p \times T}} \left\{ \|XA - XA_0\|^2 + 6K^2 \eta^2 (\sqrt{T} + \sqrt{q})^2 \sigma^2 \operatorname{rank}(A) \right\}$

with probability larger than $1 - e^{-(K-1)^2 (T+q)/2}$. This inequality ensures that the
NNP-estimator is adaptive rate-minimax.
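For a given design and a given signal, the right-hand side of the oracle inequality can be evaluated explicitly through its rank-by-rank form, which trades the tail of the squared singular values of $XA_0$ against a price proportional to the rank. The sketch below is an illustration on simulated data. The function name `oracle_bound`, the choice of $\eta$ as the exact ratio $\sigma_1(X)/\sigma_q(X)$, and the example values are assumptions.

```python
import numpy as np

def oracle_bound(X, A0, sigma, K):
    """Evaluate inf_r { sum_{k>r} sigma_k(X A0)^2 + 6 K^2 eta^2 (sqrt(T)+sqrt(q))^2 sigma^2 r }."""
    s_X = np.linalg.svd(X, compute_uv=False)
    q = np.linalg.matrix_rank(X)
    eta = s_X[0] / s_X[q - 1]                 # ratio sigma_1(X) / sigma_q(X)
    T = A0.shape[1]
    price_per_rank = 6.0 * K**2 * eta**2 * (np.sqrt(T) + np.sqrt(q)) ** 2 * sigma**2

    s_signal = np.linalg.svd(X @ A0, compute_uv=False)
    # tail[r] = sum of sigma_k(X A0)^2 over k >= r + 1, for r = 0, ..., len(s_signal)
    tail = np.concatenate([np.cumsum((s_signal**2)[::-1])[::-1], [0.0]])
    values = tail + price_per_rank * np.arange(len(tail))
    r_star = int(np.argmin(values))           # rank achieving the infimum
    return eta, r_star, values[r_star]

# Hypothetical example: rank-2 signal, n < p, unit noise variance.
rng = np.random.default_rng(4)
n, p, T = 10, 40, 5
X = rng.standard_normal((n, p))
A0 = rng.standard_normal((p, 2)) @ rng.standard_normal((2, T))

eta, r_star, bound = oracle_bound(X, A0, sigma=1.0, K=1.5)
```

The value at r = 0 is $\|XA_0\|^2$, so the bound never exceeds the error of the trivial estimator that predicts zero.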